Published online 23 October 2013 



Nucleic Acids Research, 2014, Vol. 42, Database issue D503-D509 

doi:10.1093/nar/gkt953 



MEROPS: the database of proteolytic enzymes, their 
substrates and inhibitors 

Neil D. Rawlings1,2,*, Matthew Waller1, Alan J. Barrett1,2 and Alex Bateman2 

1The Wellcome Trust Sanger Institute, Wellcome Trust Genome Campus, Hinxton, Cambridgeshire CB10 1SA, 
UK and 2Proteins and Protein Families, EMBL European Bioinformatics Institute, Wellcome Trust Genome 
Campus, Hinxton, Cambridgeshire CB10 1SD, UK 

Received September 3, 2013; Revised September 25, 2013; Accepted September 26, 2013 



ABSTRACT 

Peptidases, their substrates and inhibitors are of 
great relevance to biology, medicine and biotech- 
nology. The MEROPS database (http://merops. 
sanger.ac.uk) aims to fulfill the need for an 
integrated source of information about these. The 
database has hierarchical classifications in which 
homologous sets of peptidases and protein inhibi- 
tors are grouped into protein species, which are 
grouped into families, which are in turn grouped 
into clans. Recent developments include the follow- 
ing. A community annotation project has been 
instigated in which acknowledged experts are 
invited to contribute summaries for peptidases. 
Software has been written to provide an Internet- 
based data entry form. Contributors are 
acknowledged on the relevant web page. A new 
display showing the intron/exon structures of eu- 
karyote peptidase genes and the phasing of the 
junctions has been implemented. It is now 
possible to filter the list of peptidases from a com- 
pletely sequenced bacterial genome for a particular 
strain of the organism. The MEROPS filing pipeline 
has been altered to circumvent the restrictions 
imposed on non-interactive blastp searches, and a 
HMMER search using specially generated align- 
ments to maximize the distribution of organisms 
returned in the search results has been added. 

INTRODUCTION 

The MEROPS database is a manually curated informa- 
tion resource for proteolytic enzymes [For simplicity, we 
here use the term 'peptidase' for any proteolytic enzyme, 
although a few of them are not peptidases in the strictest 
sense because they are lyases and not hydrolases (1)], their 
inhibitors and substrates. The database can be found at 
http://merops.sanger.ac.uk. The organizational principle 
of the database is a hierarchical classification in which 
homologous sets of peptidase and protein inhibitor se- 
quences are grouped into peptidase and inhibitor species, 
which are in turn grouped into families, which are grouped 
into clans. A family contains related sequences, and a clan 
contains related structures. Sequence analysis is restricted 
to that portion of the protein directly responsible for pep- 
tidase or inhibitor activity, which is termed the 'peptidase 
unit' or the 'inhibitor unit', respectively. A peptidase or 
inhibitor unit normally corresponds to a structural 
domain, and some proteins contain more than one pep- 
tidase or inhibitor domain. Examples are potato virus Y 
polyprotein, which contains three peptidase units, each in 
a different family, and turkey ovomucoid, which contains 
three inhibitor units all in the same family. At every level 
in the database a well-characterized type example is 
chosen, to which all other members of the family or clan 
must be shown to be related in a statistically significant 
manner. The type example at the peptidase or inhibitor 
level is termed the 'holotype' (2,3). There are usually three 
releases of the MEROPS database per year. 

The sequence of family names is not consecutive 
because some families have been removed from the 
database. The most frequent reason why a family is 
removed is because a sequence relationship has been dis- 
covered to another family in the database. When the 
families are merged, the family name with the lowest 
number is retained and the one with the highest number 
is marked as deleted. A family may also be removed if 
experimentation has shown that the activity is not that 
of a peptidase. When a family is removed, the family 
name is not reassigned. A bookmarked link to a deleted 
family will either be automatically redirected to the new 
family name (or MEROPS identifier) or a message will 
appear to state that the family is no longer included in 
the database. 
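
The merging rule can be sketched in a few lines of Python. This is an illustration only and is not part of the MEROPS software: the function names, the redirect table and the family names S90, S95 and S77 are hypothetical, and the sketch assumes that merged families share the same initial letter.

```python
import re

def family_number(name):
    """Split a family name such as 'S9' into its letter and its number."""
    m = re.fullmatch(r"([A-Z])(\d+)", name)
    if m is None:
        raise ValueError(f"unrecognised family name: {name!r}")
    return m.group(1), int(m.group(2))

def merge_families(a, b, redirects):
    """Merge two families: keep the lower-numbered name, redirect the other."""
    (letter_a, num_a), (letter_b, num_b) = family_number(a), family_number(b)
    if letter_a != letter_b:
        # Assumption of this sketch, not a rule stated in the text.
        raise ValueError("sketch only handles families with the same letter")
    kept, deleted = (a, b) if num_a < num_b else (b, a)
    redirects[deleted] = kept  # the deleted name is never reassigned
    return kept

def resolve(name, redirects, removed):
    """Follow redirects; return None for a family removed without replacement."""
    while name in redirects:
        name = redirects[name]
    return None if name in removed else name
```

A bookmarked link would call something like `resolve()`: a merged family leads to the surviving name, whereas `None` corresponds to the message that the family is no longer included.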

Statistics from release 9.9 (August 2013) of MEROPS 
are shown in Table 1 and compared with release 9.5 
from July 2011. Counts of substrate cleavages, peptid- 
ase-inhibitor interactions and references are shown in 
Table 2. 



*To whom correspondence should be addressed. Tel: +44 1223 494525; Fax: +44 1223 494468; Email: ndr@sanger.ac.uk 
© The Author(s) 2013. Published by Oxford University Press. 

This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/3.0/), which 
permits unrestricted reuse, distribution, and reproduction in any medium, provided the original work is properly cited. 






Table 1. Counts of protein species, families and clans for proteolytic enzymes and protein inhibitors in the MEROPS 
database 



                                                MEROPS 9.5                 MEROPS 9.9
                                                Peptidases   Inhibitors    Peptidases   Inhibitors

Sequences                                       192 053      17 451        413 834      28 502
Identifiers
  Experimentally characterized and sequenced    2308         518           2438         542
  Hypothetical from model organisms             1250         0             1362         0
  Not active as peptidase or inhibitor          298          117           327          115
  Experimentally characterized but unsequenced  145          0             148          0
  Pseudogenes                                   70           0             70           0
  Compound and complex proteins                 15           52            16           49
  Total                                         4086         687           4361         706
Families                                        225          71            244          76
Clans                                           44           34            55           39


The numbers in Release 9.9 of MEROPS (August 2013) are compared with those in Release 9.5 of MEROPS (July 2011). 
A peptidase is referred to as 'unsequenced' when no sequence is known, or the known sequence fragments are insufficient to be 
able to assign the peptidase to a family 



Table 2. Information in the MEROPS database 



                                             MEROPS 9.5   MEROPS 9.9

Substrate cleavages: total                   54 838       64 022
Substrate cleavages: physiological           18 280       20 591
Substrate cleavages: non-physiological       28 376       35 897
Substrate cleavages: pathological            990          1166
Substrate cleavages: synthetic substrates    4229         4906
Peptidase-inhibitor interactions: total      4017         4485
Peptidase-inhibitor interactions: proteins   1220         1304
Peptidase-inhibitor interactions: SMI        2373         2562
References                                   43 497       52 600



Substrate cleavage totals do not include cleavages derived only from 
the SwissProt database (mainly removal of initiating methionines and 
signal peptides). A naturally occurring cleavage is described as 
'physiological' when the peptidase and substrate are from the same organism 
and 'pathological' if the organisms differ and are pathogen and host. 
More than half of the cleavage positions in the MEROPS collection 
have been identified by mass spectroscopy, of which over 4800 cleavages 
were obtained from the PRIDE database (4) and over 3100 from 
the TOPPR database (5). Over 3300 cleavages were derived from the 
CutDB database (6). Molecular Connections (Bangalore, India) have 
provided over 10 000 cleavages collected from the literature. How these 
data have been annotated has been described previously (7). 



Finding homologues 

To find homologues for a family we have performed 
blastp searches (8), usually using the non-interactive 
facilities at the National Center for Biotechnology 
Information (NCBI), searching the non-redundant 
protein sequence database (9). However, a number of 
families have now exceeded 10 000 homologues, which is 
the maximum number returned from a blastp search at 
NCBI. These include the families C26 (the family of 
gamma-glutamyl hydrolase), C44 (amidophosphoribosyltransferase 
precursor), M16 (pitrilysin), M20 (glutamate 
carboxypeptidase), M23 (beta-lytic metallopeptidase), 
M24 (methionyl aminopeptidase), S1 (chymotrypsin), S9 
(prolyl oligopeptidase) and S33 (prolyl aminopeptidase). 
Some of these families have exceeded 20 000 homologues 
(C26, S1 and S9), and family S12 (D-Ala-D-Ala 
carboxypeptidase B) is approaching 10 000 homologues. 
The reasons why a family contains so many homologues 
vary. For example, methionyl aminopeptidase removes the 
initiating methionine from cytoplasmic proteins and is 
present in every genome so far sequenced, whereas there have 
been numerous gene duplications in vertebrates and 
insects for family S1 (the human genome contains 186 
homologues, and Drosophila melanogaster 307 homologues). 
Some families contain relatively few peptidases 
and many homologues that are termed 'non-peptidase 
homologues'; for example, family S9 contains 5780 homologues 
that are not peptidases, usually because one of the 
active site residues has been replaced, but are other kinds 
of enzyme that have the 'α/β hydrolase' fold, such as 
lipases, carboxylesterases and esterases. 

To keep the peptidase and peptidase inhibitor families 
up-to-date with current genome sequencing projects, an 
addition to blastp searches was sought. For release 9.9, 
a second search was performed: the sequence filing 
pipeline (10,11) was modified so that the initial blastp 
search was replaced by a search of the NCBI non-redundant 
protein sequence database using HMMER as 
implemented at Janelia Farm, Howard Hughes Medical 
Institute (http://hmmer.janelia.org/) (12). HMMER 
searches allow submission of a sequence alignment, and 
for this purpose special alignments were generated for 
each family and subfamily in MEROPS. 

Because we wished to find homologues from the widest 
range of organisms possible, we generated a special alignment 
by selecting an example from every phylum that is 
represented in a peptidase family or subfamily. Where 
possible, sequences from different MEROPS identifiers, 
thus representing different peptidase species (11), were 
used. For example, the alignment for subfamily A1A contained 
homologues from 12 different phyla (see Table 3). 
So that the HMMER search can be repeated by others, 
the sequences used for each family or subfamily are 
flagged in the MySQL database, which can be downloaded 
from our FTP site. Each alignment was generated 
using ClustalX (13). 
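
The selection of one example per phylum can be illustrated with a small Python sketch. It is a simplified greedy procedure written for this illustration, not the method used by the MEROPS curators; the function name, the dictionary keys and the entry 'HYP0001' with identifier 'A01.999' are hypothetical, whereas the other accessions are taken from Table 3.

```python
def pick_representatives(candidates):
    """Pick one sequence per phylum, preferring unused MEROPS identifiers.

    candidates: list of dicts with keys 'accession', 'phylum' and
    'merops_id' (None when no identifier could be assigned).
    """
    by_phylum = {}
    for cand in candidates:
        by_phylum.setdefault(cand["phylum"], []).append(cand)

    chosen, used_ids = [], set()
    for phylum in sorted(by_phylum):
        group = by_phylum[phylum]
        # Prefer a sequence whose identifier has not been chosen already,
        # so that different peptidase species are represented where possible.
        fresh = [c for c in group
                 if c["merops_id"] and c["merops_id"] not in used_ids]
        pick = fresh[0] if fresh else group[0]
        chosen.append(pick)
        if pick["merops_id"]:
            used_ids.add(pick["merops_id"])
    return chosen

candidates = [
    {"accession": "P07267", "phylum": "Ascomycota", "merops_id": "A01.018"},
    {"accession": "F4NZG7", "phylum": "Chytridiomycota", "merops_id": "A01.018"},
    {"accession": "HYP0001", "phylum": "Chytridiomycota", "merops_id": "A01.999"},
    {"accession": "Q7XB41", "phylum": "Chlorophyta", "merops_id": "A01.096"},
]
representatives = [c["accession"] for c in pick_representatives(candidates)]
```

The chosen sequences would then be aligned (the text uses ClustalX) and the alignment submitted to the HMMER search.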






Table 3. Example of sequences used in an alignment submitted to the HMMER server 



Organism                          Phylum               MEROPS identifier   Accession      Residue range

Homo sapiens                      Chordata             [illegible]         B4DVY9         63-388
Drosophila melanogaster           Arthropoda           [illegible]         Q9VEK4         51-370
Saccoglossus kowalevskii          Hemichordata         [illegible]                        55-386
Strongylocentrotus purpuratus     Echinodermata        [illegible]         [illegible]    66-310
Capitella capitata                Annelida             [illegible]                        12-343
Caenorhabditis elegans            Nematoda             A01.A73             [illegible]    56-320
Schistosoma mansoni               Platyhelminthes                          G4VG04         58-336
Hydra magnipapillata              Cnidaria             [illegible]         [illegible]    92-417
Trichoplax adhaerens              Placozoa                                                16-344
Amphimedon queenslandica          Porifera                                 [illegible]    56-379
Arabidopsis thaliana              Streptophyta         [illegible]                        33-335
Meloidogyne incognita             [illegible]          [illegible]                        82-406
Chlamydomonas reinhardtii         Chlorophyta          A01.096             Q7XB41         65-307, 490-578
Phaeodactylum tricornutum         Ochrophyta                               B7FZ37         86-448
Ectocarpus siliculosus            Heterokontophyta                         D7FLX5         93-407
Phytophthora infestans            Oomycota                                 D0N6R0         25-378
Coprinus cinereus                 Basidiomycota                            A8N6S9         143-366
Saccharomyces cerevisiae          Ascomycota           A01.018             P07267         78-405
Rhizopus oryzae                   Zygomycota                               I1BX70         57-254
Batrachochytrium dendrobatidis    Chytridiomycota      A01.018             F4NZG7         69-399
Dictyostelium discoideum          Sarcomastigophora    A01.A89             O76856         50-378
Trichomonas vaginalis             Parabasalidea                            A2FIM5         44-351



The identifiers for the sequences used to generate an alignment for family A1 subfamily A are shown. Where no MEROPS identifier is listed, it 
is because a putative peptidase was used that could not be mapped to a MEROPS identifier. Accessions cited are mainly UniProt or RefSeq or 
are Protein Identifiers. The sequences from Capitella capitata and Meloidogyne incognita are translations from the genes Capcal_225009 
and Mincl2021, respectively. The residue range of the peptidase domain is given; in the case of Q7XB41, an unrelated nested domain interrupts 
the peptidase domain. 



The results from the HMMER searches returned more 
hits, but otherwise were consistent with the blastp searches 
in that all the hits found by blastp were also found by 
HMMER. The MEROPS filing pipeline was otherwise 
unchanged. Each sequence was submitted to a local 
blastp search against the MEROPS sequence collection, 
so that the extent of the peptidase domain and active 
site residues could be calculated and a MEROPS identifier 
could be assigned. 

If a peptidase or protein inhibitor family contained 
homologues from only one phylum, or contained only se- 
quences from viruses, then only a blastp search was 
performed. 

The methods for collecting homologues will change in 
the future because there is still a limit (20 000 sequences) 
on the number of homologues returned by the HMMER 
search implemented on the HMMER web server. 
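
The consistency check between the two searches, and the effect of the result limits, can be expressed as a short Python sketch. The function names and the hit identifiers in the usage example are hypothetical; only the two limits (10 000 for blastp at NCBI and 20 000 for the HMMER web server) come from the text.

```python
NCBI_BLASTP_LIMIT = 10000  # maximum hits returned by a blastp search at NCBI
HMMER_WEB_LIMIT = 20000    # maximum hits returned by the HMMER web server

def compare_hits(blastp_hits, hmmer_hits):
    """Compare two hit lists given as iterables of sequence identifiers."""
    blastp, hmmer = set(blastp_hits), set(hmmer_hits)
    return {
        # True when every blastp hit was also found by HMMER.
        "consistent": blastp <= hmmer,
        "missed_by_hmmer": sorted(blastp - hmmer),
        "hmmer_only": sorted(hmmer - blastp),
    }

def possibly_truncated(n_hits, limit):
    """A result set that reaches the limit may be missing homologues."""
    return n_hits >= limit

report = compare_hits(["seq1", "seq2"], ["seq1", "seq2", "seq3"])
```

A family whose hit count reaches either limit would need a different collection strategy, which is the situation anticipated in the paragraph above.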

As can be seen from Table 1, the number of sequences 
in MEROPS has more than doubled since July 2011. We 
reported a similar doubling in sequences between April 
2007 and August 2009 (14), but a more moderate 
increase between August 2009 and July 2011 (15). The 
most recent doubling of sequences is partly due to the 
ability of HMMER searches to find additional distantly 
related homologues and also the increase in the number of 
completely sequenced genomes. 
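
The growth can be checked directly from the sequence counts in Table 1. The short Python calculation below is only a worked example; the variable names are ours.

```python
# Sequence counts from Table 1 (release 9.5, July 2011; release 9.9, August 2013).
sequences = {
    "9.5": {"peptidases": 192053, "inhibitors": 17451},
    "9.9": {"peptidases": 413834, "inhibitors": 28502},
}

def fold_change(kind):
    """Ratio of the release 9.9 count to the release 9.5 count."""
    return sequences["9.9"][kind] / sequences["9.5"][kind]

peptidase_fold = fold_change("peptidases")
inhibitor_fold = fold_change("inhibitors")
total_fold = sum(sequences["9.9"].values()) / sum(sequences["9.5"].values())

print(f"peptidases: {peptidase_fold:.2f}-fold")
print(f"inhibitors: {inhibitor_fold:.2f}-fold")
print(f"all sequences: {total_fold:.2f}-fold")
```

On these figures the peptidase sequences increased about 2.15-fold and the inhibitor sequences about 1.63-fold, so the doubling is driven by the peptidases.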

MEROPS community input 

Table 1 shows that the number of peptidases that can be 
distinguished now exceeds 4000, each of which has been 
assigned a unique MEROPS identifier. Some of these 
identifiers have been set up for particular model organisms 
that have been the subject of genome sequencing projects, 
and the peptidase homologues have not yet been biochemically 
characterized (16). If these putative proteins are 
excluded, then the number of distinct biochemically 
characterized peptidases in release 9.9 is 2646. There is a 
computer-generated summary for each of these, showing 
the MEROPS classification, a figure showing the domain 
architecture and, if enough substrate cleavages are known, 
displays of specificity. In addition, there are pages for all 
orthologous proteins showing a dynamically generated 
alignment, a list of primary database cross-references 
(protein and nucleotide), a list of active site residues, a 
display of distribution amongst organisms, cross-references 
to entries in the Protein Data Bank (17,18) and a 
Richardson diagram (19) if a tertiary structure has been 
solved, a bibliography, a list of substrates and their 
cleavage sites, a list of interactions with protein and 
small molecule inhibitors and cross-references to databases 
of pharmaceutical interest. There is, however, very 
little text. 

MEROPS is run by a small team, and it is not possible 
for members of the team to write and maintain over 2600 
peptidase summaries. This is an ideal project for the wider 
scientific community. Community annotation projects 
have either made use of a centralized database such as 
Wikipedia, which is freely open to the general public or 
have used a system of registration so that only experts can 
contribute and the contribution is acknowledged. A suc- 
cessful example of a project using Wikipedia has involved 
the Rfam database of non-coding RNA sequences (20). A 
successful community annotation project that invites 
experts to contribute has been Reactome, which features 
biological pathways that include enzymes (21). We have 
chosen to follow the latter model. 






The MEROPS community annotation project requires a 
consultant to register to receive a unique password. To log 
in, a consultant must provide an email address and 
password. The consultant is then presented with a list of 
MEROPS identifiers and their recommended names, which 
are the pages available to edit. Should a consultant wish to 
add a peptidase to his or her list, then he or she can request 
this. 

On clicking the 'edit' button, the consultant is presented 
with a (usually) blank form with the following headings: 
name and history, pH optimum, activity and specificity, 
RNA splicing, preparation, physiology, pharmaceutical 
relevance, biotechnology, biological aspects, subcellular 
location, knockout, distinguishing features, substrates 
(which links to the list of known cleavages in substrates), 
inhibitors (which links to the list of known peptidase/inhibitor 
interactions), special substrate and special inhibitor. 
All of these sections are available for editing, but some 
may contain text added by the MEROPS curators (espe- 
cially the physiology, pharmaceutical relevance, biotech- 
nology and knockout fields). A consultant is not expected 
to enter text for every field, and if no information is 
known the field is best left empty. 

When a consultant has completed his or her edits and 
wishes the summary to appear in the next release of 
MEROPS, then he or she can select 'Review Requested' 
in the 'Review stage' menu and then save the page. The 
MEROPS identifier is added to the list of pages submitted 
for review, which is only visible to the curators. 



The MySQL database stores all saved versions of each 
section of each summary. The final summary presented to 
the administrator will be the most recently saved version 
of each section. Once reviewed by the administrator, the 
summary can be imported into the main MEROPS 
MySQL database. The curator adds the author details 
(names and affiliations) and the finished summary will 
appear in the next release of the MEROPS database. 
The administrator then resets the review stage to 
'Incomplete' and the summary is again available for the 
consultant to edit. An example of a completed summary is 
shown in Figure 1. 
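
Assembling the summary from the most recently saved version of each section can be sketched as follows. This Python fragment is illustrative only: the real data are held in MySQL tables whose layout is not described here, so the tuple structure, the integer timestamps and the function name are assumptions.

```python
def latest_summary(saved_versions):
    """Return {section: text} using the newest saved version of each section.

    saved_versions: iterable of (section, saved_at, text) tuples, where
    saved_at is any orderable timestamp.
    """
    latest = {}
    for section, saved_at, text in saved_versions:
        if section not in latest or saved_at > latest[section][0]:
            latest[section] = (saved_at, text)
    return {section: text for section, (_, text) in latest.items()}

versions = [
    ("Physiology", 1, "first draft"),
    ("Knockout", 2, "gene disruption phenotype"),
    ("Physiology", 3, "revised draft"),
]
summary = latest_summary(versions)
```

Because every saved version is retained, earlier text is never lost; only the view presented for review uses the newest version of each section.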

Following the publication of the third edition of the 
Handbook of Proteolytic Enzymes (22), which contains 
chapters on over 800 peptidases, each written by one or 
more acknowledged experts, the authors of each chapter 
were invited to contribute to the MEROPS community 
input project in March 2013. To date, we have received 
over thirty summaries that have now been included on the 
MEROPS website. 

Recent developments 

Gene displays. Comparisons of the intron-exon structure 
of eukaryote genes have proved to be useful in under- 
standing their evolution. It had been noticed that within 
vertebrates, gene duplications frequently occurred after 
the insertion of introns, so that the exon/intron structure 
is preserved amongst paralogues. A theory for how 
regions of DNA coding for specific domains could be 



[Figure 1 image: web form headed 'Editing summary for carboxypeptidase A6', with an edit box for each summary section, 'Save' and 'Save and submit for Review' buttons and a 'Review Stage' menu.] 

Figure 1. Form for the submission of a peptidase summary for the MEROPS community annotation project. The summary for carboxypeptidase 
A6 (MEROPS identifier M14.018) is shown. The summary was kindly provided by Professor Lloyd Fricker. 






Summary for peptidase A28.001: DNA-damage inducible protein 1 

MEROPS Name: DNA-damage inducible protein 1 
Other names: DDI1, Rings lost protein (Drosophila melanogaster), Rngo protein (Drosophila melanogaster), [illegible] (Saccharomyces cerevisiae) 

Domain architecture: [graphic not reproduced] 

Classification: Clan AA » Subclan (none) » Family A28 » Subfamily (none) » A28.001 
Holotype: DNA-damage inducible protein 1 (Saccharomyces cerevisiae), UniProt accession P40087 (peptidase unit 210-324), MERNUM [illegible] 
History: Identifier created: MEROPS 9.5 (1 July 2011) 

Catalytic type: Aspartic 
NC-IUBMB: Not yet included in IUBMB recommendations. 
Preparation: Preparation of Saccharomyces cerevisiae Ddi1 protein in a baculovirus system was described by Perteguer et al. (Perteguer et al., 2013). 
Inhibitor comments: HIV proteinase inhibitors show different levels of inhibition in a complementation assay; Ddi1 variants from different organisms also show different levels of inhibition by these inhibitors (White et al., 2011). HIV proteinase inhibitors also inhibit recombinant enzyme (Perteguer et al., 2013). 
Structure: The tertiary structure of the Ddi1 protein from Saccharomyces cerevisiae has been solved and the peptidase domain shows a fold very similar to that of retropepsin. The active form is a homodimer (Sirkis et al., 2006). The Asp-Gly-Thr-Ala motif around the active site aspartic acid is conserved between Ddi1 and retropepsins. The substrate binding groove in Ddi1 is wider, allowing bulkier substrates to bind. Additional domains that flank the peptidase domain are involved in the binding of ubiquitinated substrates and the proteasome, and Ddi1 is also known as a 'ubiquitin receptor'. There is an N-terminal ubiquitin-like (UBL) domain and a carboxy-terminal ubiquitin-associated (UBA) domain (Sirkis et al., 2006). The structure of the peptidase domain from human DDI1 has also been solved (PDB entry 3S8I). 
Biological aspects: The peptidase domain (RVP) is required for dimerization and the ubiquitin-like and ubiquitin-associated domains are required for checkpoint regulation, including rescue of the pds1-128 checkpoint mutant and enrichment of GFP-Ddi1 in the nucleus. Mutation of the active site Asp220 abolishes rescue of the pds1-128 mutant but has no effect on dimerization. The UBA domain is important for t-SNARE binding and undergoes phosphorylation on Thr346 and Thr348 (Gabriely et al., 2008). Ddi1 is involved in the turnover of a number of proteins including the F-box protein Ufo1. F-box proteins bind the core SCF components of the E3 ubiquitin-protein ligases, which in turn control the cell cycle and cyclin degradation. Ufo1 is unique in containing a domain with multiple ubiquitin-interacting motifs, with which it interacts with Ddi1, but only when Ufo1 is ubiquitinated. Deleting these motifs increases the stability of Ufo1 and arrests the cell cycle (Ivantsiv et al., 2006). Ubiquitinated endonuclease Ho also binds Ddi1 and is then exported from the nucleus to the cytoplasm, where the complex binds to and is degraded by the proteasome. Ho is important for switching between yeast mating types (Kaplun et al., 2005). Another binding partner and potential substrate of Ddi1 is Pho81p, which is an inhibitor of the cyclin-cyclin-dependent kinase (CDK) complex Pho80p-Pho85p. Ddi1 and Rad23p probably cooperate as negative regulators in the PHO pathway, which regulates expression of phosphate-responsive genes such as PHO5 encoding repressible acid phosphatase (Auesukaree et al., 2003). 
Knockout: Ddi1 was initially identified as a negative regulator of constitutive exocytosis, because gene disruption leads to increased protein secretion (Lustgarten et al., 1999; White et al., 2011). 
Pharmaceutical relevance: The enzyme from Leishmania parasites (and perhaps others) may be a potential drug target. There is evidence that this enzyme is a target for HIV-proteinase inhibitors that are shown to reduce Leishmania infections (White et al., 2011). 
Contributing authors: Colin Berry, Cardiff School of Biosciences, Cardiff University, Park Place, Cardiff, CF10 3AT, UK 



Figure 2. Example of a complete peptidase summary. The summary for DNA-damage inducible protein 1 (MEROPS identifier A28.001) is shown. 
The summary was kindly supplied by Dr Colin Berry. 



shuffled between one gene and another was developed by 
Patthy (23). A new display to present gene structures has 
been added at the peptidase level. The display shows the 
known exon and intron structure for a eukaryote gene. An 
exon is shown as a box and is numbered. Introns are 
shown as the thick line between the exons. The phase of 
the intron is indicated above the intron, where phase 0 
means the intron is inserted between codons, phase 1 
between the first and second base of the triplet and 
phase 2 between the second and third base of the triplet. 
All gene structures are taken from research articles where 
the structure was experimentally determined and are not 
taken from genome sequencing projects, where there may 
be problems with misidentification of exon-intron junctions, 
omission of exons and erroneous insertion of 
introns into coding sequence. The gene sequence displayed 
is from the initiation ATG to the stop codon, so introns 
within 5' and 3' untranslated regions are not shown. 
Alternatively spliced variants are shown where they have 
been experimentally proved to exist. Peptidase and protein 
inhibitor gene structures have been collected from the 
following eight model organisms: human, mouse, rat, 
Drosophila melanogaster, Caenorhabditis elegans, 
Arabidopsis thaliana, Saccharomyces cerevisiae and 
Schizosaccharomyces pombe. An example of the new 
display is shown in Figure 3. 
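
The definition of intron phase translates directly into arithmetic on the coding sequence. The Python sketch below is an illustration under the assumption that exon lengths are counted in coding bases only, starting at the A of the initiation ATG; the function name and the example lengths are hypothetical.

```python
def intron_phases(coding_exon_lengths):
    """Return the phase of the intron that follows each exon except the last.

    Phase 0: the intron lies between codons.
    Phase 1: between the first and second base of a codon.
    Phase 2: between the second and third base of a codon.
    """
    phases = []
    coding_bases = 0
    for length in coding_exon_lengths[:-1]:
        coding_bases += length
        # Bases of an incomplete codon left before the intron give the phase.
        phases.append(coding_bases % 3)
    return phases

example_phases = intron_phases([90, 100, 50])
```

For the hypothetical gene above, 90 coding bases precede the first intron (a whole number of codons, hence phase 0) and 190 precede the second (one base into a codon, hence phase 1).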

Organism pages. It has become common practice to 
sequence the genomes of several different strains of the 
same bacterial species. The list of strains with completely 
sequenced genomes can now be displayed on the species 
page. Selecting one of the strains causes the results to be 
filtered, and only those peptidases or inhibitors present in 
that strain are displayed. It should be noted that the 
genome analysis at the foot of the page displays results 
for the selected strain and not the species. 

Peptidases from model organisms. The number of model 
organisms has been increased to 11 with the addition of a 
Gram-positive bacterium (Bacillus subtilis), an archaean 
(Pyrococcus furiosus), a protozoan (Dictyostelium 
discoideum) and another yeast (Schizosaccharomyces 
pombe). A special MEROPS identifier, in which the 
first character after the dot is A, B or C, has been 
created for each putative peptidase from each of these 
organisms. 
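
The structure of a MEROPS identifier can be made explicit with a small parser. This Python sketch is an illustration, not MEROPS code: the regular expression is inferred from identifiers quoted in this article (such as A01.018, A01.A73, M14.018 and C101.001) and may not cover every identifier in the database, and the function name and returned keys are hypothetical.

```python
import re

# One letter, a two- or three-digit family number, a dot, then three
# characters of which the first may be a digit or a capital letter.
_IDENTIFIER = re.compile(r"([A-Z])(\d{2,3})\.([0-9A-Z])(\d{2})")

def parse_identifier(identifier):
    """Split an identifier into its family name and its suffix."""
    m = _IDENTIFIER.fullmatch(identifier)
    if m is None:
        raise ValueError(f"unrecognised identifier: {identifier!r}")
    letter, number, first, rest = m.groups()
    return {
        # 'A01' in an identifier corresponds to family A1.
        "family": f"{letter}{int(number)}",
        "suffix": first + rest,
        # A, B or C after the dot marks a putative peptidase
        # from a model organism.
        "model_organism_putative": first in "ABC",
    }

parsed = parse_identifier("A01.A73")
```

Here `parsed` describes a putative peptidase from a model organism in family A1, whereas an identifier such as A01.018 would be reported with `model_organism_putative` set to `False`.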

Literature. Links are now being presented to Europe 
PubMed Central and PubMed. 

A new item has been added to the search menu that 
allows a user to retrieve references by submitting a 
simple text search. A user can enter an author name, a 
term from a title or a journal name. The retrieved list 
displays the full reference with, where available, links to 
PubMed, PubMed Central, the full text of the article and 
clan, family, peptidase or inhibitor summaries in 
MEROPS. 

Peptidase families and identifiers. There have been two 
significant developments concerning peptidase family 
names and MEROPS identifiers. 






Gene Structures for A01.010 




Figure 3. Example of a gene structure. The gene structures for cathepsin E (MEROPS identifier A01.010) are shown. 



The recent crystal structure of the precursor of the 
pantetheinyl hydrolase ThnT from Streptomyces cattleya 
(24) has shown that auto-activation exposes a threonine at 
the new N-terminus, occupying the same position as a 
serine in the homologous aminopeptidase DmpA from 
Ochrobactrum anthropi. This means that the nucleophile 
in peptidases in this family can be either threonine or 
serine. In all other known families of peptidases, the 
nucleophile is absolutely conserved. The family therefore 
cannot be named according to the convention 
used so far in MEROPS in which the first letter of the 
family name represents the nature of the nucleophile. This 
family has been named P1, which is the first in a new 
category of families with mixed nucleophiles. 

The first family to be assigned an identifier with three 
digits is the cysteine peptidase family C101, which includes 
the FAM105B (or OTULIN) isopeptidase (C101.001). 
This is a de-ubiquitinating enzyme that is specific for 
Met1 linkages (25). 



ACKNOWLEDGEMENTS 

The authors would like to thank the following: the authors 
who have contributed peptidase summaries to the community 
annotation project; Matthew Jenner and Danielle 
Weaver for help with testing the software for this 
project; Pfam and Rfam colleagues for helpful discussions, 
especially John Tate for help with displays; Paul Bevan 
from the Sanger Institute web team for all his help in 
maintaining this resource; and Molecular Connections 
(Bangalore, India), who were engaged to collect substrate 
cleavages from the scientific literature. They would also 
like to thank those users who have pointed out errors 
and omissions or those who have suggested changes and 
improvements. 



FUNDING 

Wellcome Trust [WT0077044/Z/05/Z]. Funding for open 
access charge: Wellcome Trust. 

Conflict of interest statement. None declared. 



REFERENCES 

1. Rawlings,N.D., Barrett,A.J. and Bateman,A. (2011) Asparagine 
peptide lyases: a seventh catalytic type of proteolytic enzymes. 
J. Biol. Chem., 286, 38321-38328. 

2. Rawlings,N.D. and Barrett,A.J. (1993) Evolutionary families of 
peptidases. Biochem. J., 290, 205-218. 

3. Rawlings,N.D., Tolle,D.P. and Barrett,A.J. (2004) Evolutionary 
families of peptidase inhibitors. Biochem. J., 378, 705-716. 

4. Vizcaino,J.A., Cote,R.G., Csordas,A., Dianes,J.A., Fabregat,A., 
Foster,J.M., Griss,J., Alpi,E., Birim,M., Contell,J. et al. (2013) 
The PRoteomics IDEntifications (PRIDE) database and 
associated tools: status in 2013. Nucleic Acids Res., 41, 
D1063-D1069. 

5. Colaert,N., Maddelein,D., Impens,F., Van Damme,P., 
Plasman,K., Helsens,K., Hulstaert,N., Vandekerckhove,J., 
Gevaert,K. and Martens,L. (2013) The Online Protein Processing 
Resource (TOPPR): a database and analysis platform for 
protein processing events. Nucleic Acids Res., 41, D333-D337. 

6. Igarashi,Y., Eroshkin,A., Gramatikova,S., Gramatikoff,K., 
Zhang,Y., Smith,J.W., Osterman,A.L. and Godzik,A. (2007) 
CutDB: a proteolytic event database. Nucleic Acids Res., 35, 
D546-D549. 

7. Rawlings,N.D. (2009) A large and accurate collection of 
peptidase cleavages in the MEROPS database. Database, 2009, 
bap015. 

8. Altschul,S.F., Madden,T.L., Schaffer,A.A., Zhang,J., Zhang,Z., 
Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: 
a new generation of protein database search programs. 
Nucleic Acids Res., 25, 3389-3402. 

9. NCBI Resource Coordinators. (2013) Database resources of the 
National Center for Biotechnology Information. Nucleic Acids 
Res., 41, D8-D20. 

10. Barrett,A.J., Rawlings,N.D. and O'Brien,E.A. (2001) The 
MEROPS database as a protease information system. 
J. Struct. Biol., 134, 95-102. 

11. Barrett,A.J. and Rawlings,N.D. (2007) 'Species' of peptidases. 
Biol. Chem., 388, 1151-1157. 

12. Finn,R.D., Clements,J. and Eddy,S.R. (2011) HMMER web 
server: interactive sequence similarity searching. Nucleic Acids 
Res., 39, W29-W37. 

13. Larkin,M.A., Blackshields,G., Brown,N.P., Chenna,R., 
McGettigan,P.A., McWilliam,H., Valentin,F., Wallace,I.M., 
Wilm,A., Lopez,R. et al. (2007) Clustal W and Clustal X version 
2.0. Bioinformatics, 23, 2947-2948. 

14. Rawlings,N.D., Barrett,A.J. and Bateman,A. (2010) 
MEROPS: the peptidase database. Nucleic Acids Res., 38, 
D227-D233. 






15. Rawlings,N.D., Barrett,A.J. and Bateman,A. (2012) MEROPS: 
the database of proteolytic enzymes, their substrates and 
inhibitors. Nucleic Acids Res., 40, D343-D350. 

16. Rawlings,N.D. (2013) Identification and prioritization of novel 
uncharacterized peptidases for biochemical characterization. 
Database, 2013, bat022. 

17. Rose,P.W., Bi,C., Bluhm,W.F., Christie,C.H., Dimitropoulos,D., 
Dutta,S., Green,R.K., Goodsell,D.S., Prlic,A., Quesada,M. et al. 
(2013) The RCSB Protein Data Bank: new resources for research 
and education. Nucleic Acids Res., 41, D475-D482. 

18. Rose,P.W., Beran,B., Bi,C., Bluhm,W.F., Dimitropoulos,D., 
Goodsell,D.S., Prlic,A., Quesada,M., Quinn,G.B., Westbrook,J.D. 
et al. (2011) The RCSB Protein Data Bank: redesigned web site 
and web services. Nucleic Acids Res., 39, D392-D401. 

19. Richardson,J.S. (1985) Schematic drawings of protein structures. 
Methods Enzymol., 115, 359-380. 

20. Daub,J., Gardner,P.P., Tate,J., Ramskold,D., Manske,M., 
Scott,W.G., Weinberg,Z., Griffiths-Jones,S. and Bateman,A. 
(2008) The RNA WikiProject: community annotation of RNA 
families. RNA, 14, 2462-2464. 

21. Croft,D., O'Kelly,G., Wu,G., Haw,R., Gillespie,M., Matthews,L., 
Caudy,M., Garapati,P., Gopinath,G., Jassal,B. et al. (2011) 
Reactome: a database of reactions, pathways and biological 
processes. Nucleic Acids Res., 39, D691-D697. 

22. Broadbent,J.R. and Steele,J.L. (2013) Lactocepin: the cell 
envelope-associated endopeptidase of lactococci. 
In: Rawlings,N.D. and Salvesen,G.S. (eds), Handbook of 
Proteolytic Enzymes. Elsevier, Amsterdam, pp. 3188-3195. 

23. Patthy,L. (1985) Evolution of the proteases of blood coagulation 
and fibrinolysis by assembly from modules. Cell, 41, 657-663. 

24. Buller,A.R., Freeman,M.F., Wright,N.T., Schildbach,J.F. and 
Townsend,C.A. (2012) Insights into cis-autoproteolysis reveal 
a reactive state formed through conformational rearrangement. 
Proc. Natl Acad. Sci. USA, 109, 2308-2313. 

25. Keusekotten,K., Elliott,P.R., Glockner,L., Fiil,B.K., 
Damgaard,R.B., Kulathu,Y., Wauer,T., Hospenthal,M.K., 
Gyrd-Hansen,M., Krappmann,D. et al. (2013) OTULIN antagonizes 
LUBAC signaling by specifically hydrolyzing Met1-linked 
polyubiquitin. Cell, 153, 1312-1326. 



